QSAR Modeling of Imbalanced High-Throughput Screening Data in PubChem

نویسندگان

  • Alexey V. Zakharov
  • Megan L. Peach
  • Markus Sitzmann
  • Marc C. Nicklaus
چکیده

Many of the structures in PubChem are annotated with activities determined in high-throughput screening (HTS) assays. Because of the nature of these assays, the activity data are typically strongly imbalanced, with a small number of active compounds contrasting with a very large number of inactive compounds. We have used several such imbalanced PubChem HTS assays to test and develop strategies to efficiently build robust QSAR models from imbalanced data sets. Different descriptor types [Quantitative Neighborhoods of Atoms (QNA) and "biological" descriptors] were used to generate a variety of QSAR models in the program GUSAR. The models obtained were compared using external test and validation sets. We also report on our efforts to incorporate the most predictive of our models in the publicly available NCI/CADD Group Web services ( http://cactus.nci.nih.gov/chemical/apps/cap).

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A novel method for mining highly imbalanced high-throughput screening data in PubChem

MOTIVATION The comprehensive information of small molecules and their biological activities in PubChem brings great opportunities for academic researchers. However, mining high-throughput screening (HTS) assay data remains a great challenge given the very large data volume and the highly imbalanced nature with only small number of active compounds compared to inactive compounds. Therefore, ther...

متن کامل

QSAR Classification Model for Antibacterial Compounds and Its Use in Virtual Screening

As novel and drug-resistant bacterial strains continue to present an emerging health threat, the development of new antibacterial agents is critical. This includes making improvements to existing antibacterial scaffolds as well as identifying novel ones. The aim of this study is to apply a Bayesian classification QSAR approach to rapidly screen chemical libraries for compounds predicted to have...

متن کامل

Challenges in Secondary Analysis of High Throughput Screening Data

Repurposing an existing drug for an alternative use is not only a cost effective method of development, but also a faster process due to the drug's previous clinical testing and established pharmokinetic profiles. A potentially rich resource for computational drug repositioning approaches is publically available high throughput screening data, available in databases such as PubChem Bioassay and...

متن کامل

Benchmarking ligand-based virtual High-Throughput Screening with the PubChem database.

With the rapidly increasing availability of High-Throughput Screening (HTS) data in the public domain, such as the PubChem database, methods for ligand-based computer-aided drug discovery (LB-CADD) have the potential to accelerate and reduce the cost of probe development and drug discovery efforts in academia. We assemble nine data sets from realistic HTS campaigns representing major families o...

متن کامل

PubChem BioAssay: 2014 update

PubChem's BioAssay database (http://pubchem.ncbi.nlm.nih.gov) is a public repository for archiving biological tests of small molecules generated through high-throughput screening experiments, medicinal chemistry studies, chemical biology research and drug discovery programs. In addition, the BioAssay database contains data from high-throughput RNA interference screening aimed at identifying cri...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره 54  شماره 

صفحات  -

تاریخ انتشار 2014